Abstract

Biologists measure information in different ways. Neurobiologists and 
researchers in bioinformatics often measure information using information- 
theoretic measures such as Shannon's entropy or mutual information.
Behavioral biologists and evolutionary ecologists more commonly use
decision-theoretic measures, such as the value of information, which assess
the worth of information to a decision maker. Here we show that these two kinds
of measures are intimately related in the context of biological evolution. 
We present a simple model of evolution in an uncertain environment, and 
calculate the increase in Darwinian fitness that is made possible by infor- 
mation about the environmental state. This fitness increase — the fitness 
value of information — is a composite of both Shannon's mutual infor- 
mation and the decision-theoretic value of information. Furthermore, we 
show that in certain cases the fitness value of responding to a cue is exactly 
equal to the mutual information between the cue and the environment. In 
general the Shannon entropy of the environment, which seemingly fails to 
take anything about organismal fitness into account, nonetheless imposes 
an upper bound on the fitness value of information. 

1 Introduction 

Living organisms acquire, store, process, and transmit information, and as
such, information is a central organizing principle in biological systems at
every scale from the digital coding in DNA to the long-range calls of cetaceans
[1]. While information-theoretic measures such as entropy and mutual
information [2, 3, 4] have been embraced in neurobiology and bioinformatics,
these measures are less commonly used in behavioral biology and evolutionary
ecology.

The problem is that entropy and mutual information do not directly address 
information quality; they do not distinguish between relevant and irrelevant 
information. Thus decision theorists, economists, and behavioral biologists
typically measure information by considering its value: its effect on expected
payoff or expected fitness [5, 6, 7, 8, 9, 10].

Definition: The value of information associated with a cue or signal 
C is defined as the difference between the maximum expected payoff 
or fitness that a decision-maker can obtain by conditioning on C 
and the maximum expected payoff that could be obtained without 
conditioning on C. 

The disconnect between entropy and mutual information on one hand and 
the value of information on the other has long puzzled biologists in general and 
the authors of this paper in particular. Entropy and mutual information appear 
to measure information quantity while reflecting nothing about fitness conse- 
quences; the value of information measures fitness consequences but has nothing 
to do with the actual length or information quantity of a message. But early 
work in population genetics [11, 12, 13, 14]¹ and recent analyses of evolution
in fluctuating environments [17, 18] hint at a possible relation between
information and fitness. What is this relation? Information theorists since
Kelly [?]
have observed that in special circumstances, information value and information- 
theoretic measures may be related. Here we argue that these special circum- 
stances are exactly those about which biologists should be most concerned: the 
context of evolution by natural selection. We address the question "how much 
is information worth to living organisms?" and show that the answer turns out 
to be a striking amalgam of mutual information and the decision-theoretic value 
of information. 

2 A basic model 

As evolutionary biologists, how should we measure the cost of uncertainty² or
the value of information? We want to know how the information affects fitness,
so the natural measure of the worth of information is the following: The
fitness value of information, G, is the greatest fitness decrement or cost that
would be favored by natural selection in exchange for the ability to attain the
information.

¹ Indeed, Claude Shannon wrote a PhD thesis in population genetics before
embarking on the work that launched the field of information theory [15, 16].

² Numerous studies in population ecology and genetics have shown that fitness
and population growth in uncertain environments depend on the exact nature of
the uncertainty; they depend both on the distribution of individual
reproductive successes, and on the correlations in individual successes
(reviewed in ref. [20]). One can capture this complexity by distinguishing
between two types of uncertainty or risk [21]. Idiosyncratic risk is
independent of that faced by other individuals, whereas aggregate risk is
perfectly correlated among individuals. For example, predation imposes largely
idiosyncratic risk on a herd of herbivores, whereas drought imposes largely
aggregate risk. In this paper, we focus exclusively on aggregate risk. We will
address mixed aggregate and idiosyncratic risk in a subsequent report.

Like stockbrokers and habitual gamblers, biological organisms faced with 
uncertain conditions are selected to behave as if they are concerned with long- 
term growth rates. Thus the fitness value of information to biological organisms 
is best measured in terms of the consequences of this information on the long- 
term growth rates of organismal lineages. Maximizing long-term growth in such 
conditions is the same as maximizing the expected value of the logarithm of the 
growth rate in a single generation [22, 23] (as opposed to the expected value of
the growth rate itself).

To illustrate these results and to develop an intuition about the value of
information in biological systems, consider the following simple model of a
population of annual organisms living in a variable environment³. The state of
the environment in each year is an independent random variable $\Phi$ with two
states $\phi_1$ and $\phi_2$ that occur with probabilities $p_1$ and
$p_2 = 1 - p_1$ respectively. All individuals encounter exactly the same
environment in a given year. At the beginning of its development, each organism
makes an important developmental decision to adopt one of two alternative
phenotypes: one suited to environment $\phi_1$, or one suited to $\phi_2$. The
organism survives to reproduce only if its phenotype properly matches the
demands of the current environment. The organism's fitness is given by the
following matrix:

                          Phenotype 1    Phenotype 2
Environment $\phi_1$         $w_1$            0
Environment $\phi_2$           0            $w_2$

What should these individuals do in the absence of information about the
condition of the environment? In the short run, individuals maximize expected
fitness by employing the highest-payoff phenotype only. This yields an expected
single-generation fitness of $\max[p_1 w_1, p_2 w_2]$.

But in the long run, playing only one strategy will inevitably lead to a year
with zero fitness and consequent extinction. Thus natural selection will favor
not the short run maximization above, but rather a maximization of long-term
fitness. These organisms will be selected to hedge their bets during development
[24, 25], developing into phenotype 1 with some probability and phenotype
2 otherwise⁴. As we consider a larger and larger span of generations, natural
selection is overwhelmingly likely [21] to favor the strategy that maximizes
the growth rate for "typical sequences" [?], in which environment $\phi_1$
occurs $N p_1$ times, and environment $\phi_2$ occurs $N p_2$ times. For a
genotype that develops with probability $x$ into phenotype 1, the population
growth over such a sequence of $N$ events will be
$(w_1 x)^{N p_1} (w_2 (1 - x))^{N p_2}$ and will be maximized when
$N (p_1 \log(w_1 x) + p_2 \log(w_2 (1 - x)))$ is maximized. This occurs when
$p_1 / x = p_2 / (1 - x)$, or when $x = p_1$. Thus for almost all sequences of
environments, the strategy that develops into phenotype 1 with probability
$p_1$ will maximize the growth rate and thus take over the population. For this
strategy, the expected log growth rate will be
$p_1 \log(w_1 p_1) + p_2 \log(w_2 p_2)$.

³ In this section, we follow Cover and Thomas's (1991) presentation; these
authors offer a parable about a habitual gambler who perpetually reinvests his
entire winnings at the horse track. Their gambling story can be recast quite
naturally as a model of organisms evolving by natural selection to match their
physiologies to uncertain environmental conditions.

⁴ Alternatively, organisms can hedge their bets via phenotypic switching, as
with the bacterial persistence phenotype [26, 27, 28].
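
As a hedged numerical illustration of the argument above, the following Python
sketch evaluates the expected log growth rate
$p_1 \log(w_1 x) + p_2 \log(w_2 (1 - x))$ on a grid of $x$ values and checks
that the best grid point coincides with $x = p_1$. The function names, the grid
resolution, and the parameter values ($p_1 = 0.7$, $w_1 = 2$, $w_2 = 5$) are
assumptions chosen for illustration only; they are not part of the model.

```python
import math

def log_growth(x, p1, w1, w2):
    """Expected log growth when a fraction x develops into phenotype 1."""
    p2 = 1.0 - p1
    return p1 * math.log(w1 * x) + p2 * math.log(w2 * (1.0 - x))

def best_fraction(p1, w1, w2, steps=1000):
    """Grid search over x in (0, 1); returns the best grid point."""
    grid = [i / steps for i in range(1, steps)]
    return max(grid, key=lambda x: log_growth(x, p1, w1, w2))

# Assumed, purely illustrative parameter values.
p1, w1, w2 = 0.7, 2.0, 5.0
x_star = best_fraction(p1, w1, w2)
print(x_star, log_growth(x_star, p1, w1, w2))
```

Changing `w1` and `w2` in this sketch shifts the growth rate but should leave
the maximizing fraction at `p1`, in line with the derivation.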

We have set up a simple biological model where uncertainty critically affects 
fitness. What is the fitness value of information here? Suppose that individuals 
are able to detect a cue that they can use to forecast the state of the environ- 
ment with 100% accuracy. In this case the organism will use phenotype 1 in 
environment 1, and phenotype 2 in environment 2. What is the fitness value of 
[being able to obtain] this cue? 

First, we can look at how the cue improves the short-run expected fitness. 
With the cue, individuals can always develop the appropriate phenotype for the 
environment, and obtain short-run expected fitness $p_1 w_1 + p_2 w_2$. Thus in
the short run, the expected value of information is
$p_1 w_1 + p_2 w_2 - \max[p_1 w_1, p_2 w_2] = \min[p_1 w_1, p_2 w_2]$. This is
exactly the decision-theoretic value of information.

But natural selection will not maximize short run expected fitness; instead 
as discussed above it maximizes the expected log fitness. Without the cue, the 
expected log growth rate is
$R_{\text{no inf}} = p_1 \log(p_1 w_1) + p_2 \log(p_2 w_2)$. With the cue it is
$R_{\text{inf}} = p_1 \log w_1 + p_2 \log w_2$. The fitness value of information
$G$ is the difference between growth with and without the cue,
$R_{\text{inf}} - R_{\text{no inf}}$, and this quantity is exactly the mutual
information between the perfectly informative cue and the environment,
$-p_1 \log p_1 - p_2 \log p_2$. The payoffs $w_i$ have dropped out. For this
very simple example, the fitness value of information has nothing to do with
the fitnesses $w_1$ and $w_2$, but instead depends exclusively on the mutual
information measure.
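
The cancellation of the payoffs can be checked numerically. The sketch below is
a minimal illustration under assumed values: the probability vector and the two
payoff vectors are made up, and the function names are hypothetical. It
computes the growth rates with and without a perfect cue and compares their
difference with the Shannon entropy of the environment (in nats, since natural
logarithms are used throughout).

```python
import math

def growth_without_cue(p, w):
    """Proportional betting: invest p[i] in phenotype i; only a match survives."""
    return sum(pi * math.log(pi * wi) for pi, wi in zip(p, w))

def growth_with_cue(p, w):
    """A perfect cue lets every individual adopt the matching phenotype."""
    return sum(pi * math.log(wi) for pi, wi in zip(p, w))

def entropy(p):
    """Shannon entropy in nats."""
    return -sum(pi * math.log(pi) for pi in p if pi > 0)

p = [0.7, 0.3]                          # assumed environment probabilities
for w in ([2.0, 5.0], [1.5, 40.0]):     # two assumed payoff vectors
    G = growth_with_cue(p, w) - growth_without_cue(p, w)
    print(G, entropy(p))
```

Both payoff vectors should give the same difference, equal to the entropy up to
floating-point rounding.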

This result generalizes naturally to cues that are only partially informative
[?]. If the cue is a random variable $C$, the fitness value of information will
be the mutual information
$I(\Phi; C) = \sum_{\phi, c} p(\phi, c) \log \frac{p(\phi, c)}{p(\phi) p(c)}$
between the cue $C$ and the state of the environment $\Phi$.
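
To make the partially informative case concrete, here is a small Python sketch
for the basic model (zero fitness on a mismatch). It assumes that, after seeing
cue value $c$, the organism invests the posterior probabilities
$\Pr(\phi|c)$, which is the same proportional-betting argument applied cue by
cue. The joint distribution and payoffs are invented for illustration, and all
function names are hypothetical.

```python
import math

def mutual_information(joint):
    """I(Phi; C) in nats from a table joint[phi][c] = Pr(phi, c)."""
    p_phi = [sum(row) for row in joint]
    p_c = [sum(col) for col in zip(*joint)]
    return sum(
        pj * math.log(pj / (p_phi[i] * p_c[k]))
        for i, row in enumerate(joint)
        for k, pj in enumerate(row)
        if pj > 0
    )

def growth_with_cue(joint, w):
    """Invest the posterior Pr(phi | c) after each cue value c."""
    p_c = [sum(col) for col in zip(*joint)]
    return sum(
        pj * math.log(w[i] * pj / p_c[k])
        for i, row in enumerate(joint)
        for k, pj in enumerate(row)
        if pj > 0
    )

def growth_without_cue(joint, w):
    """Invest the prior Pr(phi), ignoring the cue."""
    p_phi = [sum(row) for row in joint]
    return sum(pi * math.log(w[i] * pi) for i, pi in enumerate(p_phi))

# Assumed example: prior (0.7, 0.3), cue correct with probability 0.8.
joint = [[0.56, 0.14], [0.06, 0.24]]
w = [2.0, 5.0]
G = growth_with_cue(joint, w) - growth_without_cue(joint, w)
print(G, mutual_information(joint))
```

Under these assumptions the two printed numbers should agree up to rounding.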

3 Two illustrative examples 

Thus far we have been looking at a very special case in which the fitness of the 
organism is zero when the wrong phenotype is adopted. A more realistic model 
would allow the possibility of non-zero fitness even when the organism develops 
to the wrong phenotype. 

Example 1 

We start with a two-environment, two-phenotype example. Since the players
have no control over the state of the environment, we can study the decision- 
theoretic behavior of the players without loss of generality using the following 
matrix where $1 > a > b$:

                          Phenotype 1    Phenotype 2
Environment $\phi_1$           1              a
Environment $\phi_2$           b              1

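If the organism invests $x$ in phenotype 1 and $1 - x$ in phenotype 2, her
expected log growth rate will be
$p \log[x + a(1 - x)] + (1 - p) \log[bx + (1 - x)]$. In the absence of
information about which environmental state is realized, the choice of
$x^*(p)$ that maximizes expected log growth given the probability $p$ of
environment 1 is:

$$x^*(p) = \begin{cases} 0 & p \le \frac{a(1-b)}{1-ab} \\ \frac{p}{1-b} - \frac{a(1-p)}{1-a} & \frac{a(1-b)}{1-ab} < p < \frac{1-b}{1-ab} \\ 1 & p \ge \frac{1-b}{1-ab} \end{cases} \tag{1}$$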

We see that the organism hedges bets only in a central region. Beyond 
that region, the optimal bet-hedging strategy would require the organism to 
produce one of the phenotypes with negative probability. This sort of investment 
may be feasible in a stock market or a horse race, but negative bets seem to 
lack a biological meaning. In biological situations, we do better to look at the 
constrained case where the organism must produce each phenotype with non- 
negative probability. 



[ Figure 1 about here ] 



If the organism responds to a cue $C$ that gives the exact state of the
environment, she will always match her phenotype to the environment, for an
expected log growth rate of $\log(1) = 0$. The fitness value of information $G$
is shown in Figure 1. In the central region
$\frac{a(1-b)}{1-ab} < p < \frac{1-b}{1-ab}$, the fitness value of information
is equal to the mutual information $I(\Phi; C)$ between the (perfectly
informative) cue $C$ and the environment $\Phi$, plus a linear function of the
probability of each environment. Outside this range, when the optimal strategy
invests in only one of the phenotypes, the value of the cue is $-p \log[a]$ or
$-(1 - p) \log[b]$. This is simply the decision-theoretic log value of
information, i.e., the expected log of the value one would get if one took the
decision-theoretic approach of maximizing fitness in one generation.
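
The piecewise behavior of Example 1 can be reproduced with a short Python
sketch. It assumes the log growth expression
$p \log[x + a(1 - x)] + (1 - p) \log[bx + (1 - x)]$ given above and obtains the
constrained optimum by clipping the interior stationary point to $[0, 1]$,
which is valid here because the objective is concave in $x$. The function names
are hypothetical; the parameter values $a = 0.65$, $b = 0.35$ are those quoted
in the caption of Figure 1.

```python
import math

def optimal_fraction(p, a, b):
    """Constrained optimum: interior stationary point clipped to [0, 1]."""
    x = p / (1.0 - b) - (1.0 - p) * a / (1.0 - a)
    return min(1.0, max(0.0, x))

def log_growth(x, p, a, b):
    """Expected log growth for investment x in phenotype 1."""
    return (p * math.log(x + a * (1.0 - x))
            + (1.0 - p) * math.log(b * x + (1.0 - x)))

def value_of_perfect_cue(p, a, b):
    """With the cue, log growth is log(1) = 0, so G = -(growth without cue)."""
    return -log_growth(optimal_fraction(p, a, b), p, a, b)

a, b = 0.65, 0.35
for p in (0.3, 0.7, 0.9):   # below, inside, and above the hedging region
    print(p, optimal_fraction(p, a, b), value_of_perfect_cue(p, a, b))
```

For `p = 0.3` and `p = 0.9` the sketch should return the linear values
$-p \log[a]$ and $-(1 - p) \log[b]$; for `p = 0.7` it should return the entropy
plus the linear correction.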

Example 2 

To get a better intuition of how the fitness value of information relates to
the evolutionarily optimal strategy in the absence of information, we move to
the case of 3 environments that occur with probabilities $p_1$, $p_2$, and
$p_3 = 1 - p_1 - p_2$. While the principles generalize to larger numbers of
environments and less-symmetric payoffs, three symmetric environments are far
easier to represent graphically than are the more complicated alternatives.
Thus we consider the following payoff structure where $a > 1$:

                   Phenotype 1   Phenotype 2   Phenotype 3
Env. $\phi_1$           a             1             1
Env. $\phi_2$           1             a             1
Env. $\phi_3$           1             1             a

Figure 1: The fitness value of information (heavy solid curve) as a function
of the environmental probability $p$ is a composite of three value functions:
Curve (A) is the sum of mutual information between cue and environment and a
linear function of the environmental probabilities:
$-(p \log[p] + (1 - p) \log[1 - p]) - (\log[1 - ab] - (1 - p) \log[1 - a] - p \log[1 - b])$.
Curves (B) and (C) are the linear functions $-p \log[a]$ and
$-(1 - p) \log[b]$ respectively. Parameter values: $a = 0.65$, $b = 0.35$.
Simple calculus reveals that despite being composed of a linear component and a
logarithmic component, the fitness value of information $G$ is not only
continuous but also once continuously differentiable. Axis label: Value G.

Figure 2: Fractional investment in each strategy in order to maximize long-term
growth rate, as a function of the probabilities $(p_1, p_2, p_3)$ of each
environment, for example 2 with $a = 2$. Because of the constraint
$p_1 + p_2 + p_3 = 1$, we can represent the space of all possible environment
probabilities $(p_1, p_2, p_3)$ as the two-dimensional simplex where one corner
represents (1, 0, 0), another (0, 1, 0), and the third (0, 0, 1). The height of
the three surfaces at any point indicates the fractional investment in each
strategy at that point. On the left, we have the unconstrained optimum, in
which individuals may "bet against" certain phenotypes, investing negatively in
them and putting the surplus into the other phenotypes. On the right is the
constrained (and biologically relevant) solution, in which the fraction
invested in each phenotype must be non-negative.

Using the approach sketched out above, we can compute the fractional
investment $x_1, x_2, x_3$ in each strategy that maximizes long-term growth rate:

$$\begin{aligned} x_1 &= (p_1 (1 + a) - p_2 - p_3) / (a - 1) \\ x_2 &= (p_2 (1 + a) - p_1 - p_3) / (a - 1) \\ x_3 &= (p_3 (1 + a) - p_1 - p_2) / (a - 1) \end{aligned} \tag{2}$$

[ Figure 2 about here ] 



This optimal strategy is shown in the left panel of Figure 2. Here we have
a curious sort of investment; the gray surface is the "invest zero" plane. When
the colored surfaces drop below this, the player is effectively betting against
those phenotypes by producing them with negative probability, which makes
no biological sense, as discussed above. Our solution is then only reasonable in
the central region where all three bets are non-negative. This area, which we
will call Region 1, is delimited by $p_i > 1/(2 + a)$ for all $i = 1, 2, 3$.
Outside of Region 1, we will have to compute optimal bets subject to
constraints that no bet is negative. We do this below and illustrate the result
in the right panel of Figure 2.

When one environment is sufficiently infrequent but the other two are common,
an individual will invest in the phenotypes corresponding to the two common
environments but not in the rare one. There are three such regions on the
simplex, with boundaries given by the trio of inequalities
$p_i < 1/(2 + a)$, $p_j < p_k a$, and $p_k < p_j a$. In these three areas,
which collectively we call Region 2, optimal allocation is given by

$$\begin{aligned} x_i &= 0 \\ x_j &= (p_j a - p_k) / ((a - 1)(1 - p_i)) \\ x_k &= (p_k a - p_j) / ((a - 1)(1 - p_i)) \end{aligned} \tag{3}$$

The factor $1 - p_i = p_j + p_k$ in the denominators ensures that
$x_j + x_k = 1$.

Finally, when two environments are sufficiently rare, individuals will produce 
only the phenotype corresponding to the common environment. This occurs 
outside of the areas covered by Regions 1 and 2, in three corner areas which 
collectively we call Region 3. 

Because of the different betting strategies in each region, the value of
information in each region is computed by a different formula. We take these in
turn. In Region 1, a cue indicating the state of the environment increases the
expected log growth rate by

$$\log[a/(2 + a)] - \sum_i p_i \log p_i = \log[a/(2 + a)] + I(\Phi; C) \tag{4}$$

This is simply a constant plus the mutual information between the environment 
and the (perfectly informative) cue. 

In Region 2, let $i$ be the phenotype never adopted by the organism. Then
the cue increases the expected log growth rate by

$$\log a - \sum_{j \neq i} p_j \log p_j - (1 - p_i) \log[(1 + a)/(1 - p_i)] \tag{5}$$

In Region 3, let $i$ be the phenotype always adopted by the organism. The
cue increases the expected log growth rate by

$$(1 - p_i) \log a \tag{6}$$

This is simply the decision-theoretic log value of information, i.e., the log of the 
value one would get if one took the decision-theoretic approach of maximizing 
fitness in one generation. 
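
The three regions of Example 2 can be explored with the following Python
sketch. It is a minimal illustration, not the authors' procedure: it assumes
that the constrained optimum can be found by solving the Lagrange conditions on
the currently invested phenotypes and dropping the phenotype with the most
negative investment until none is negative. The function names are
hypothetical, and $a = 2$ matches the value used in the figures.

```python
import math

def allocation(p, a):
    """Constrained optimal investment (x1, x2, x3) for the symmetric payoffs."""
    active = [0, 1, 2]
    while True:
        mass = sum(p[i] for i in active)
        k = len(active)
        # Lagrange solution restricted to the phenotypes still invested in.
        x = {i: (p[i] * (a - 1 + k) / mass - 1) / (a - 1) for i in active}
        worst = min(active, key=lambda i: x[i])
        if x[worst] >= 0 or k == 1:
            break
        active.remove(worst)
    return [x.get(i, 0.0) for i in range(3)]

def value_of_perfect_cue(p, a):
    """G = log(a) minus the best expected log growth without the cue."""
    x = allocation(p, a)
    growth = sum(pi * math.log(1 + (a - 1) * xi) for pi, xi in zip(p, x))
    return math.log(a) - growth

a = 2.0
for p in ([0.4, 0.3, 0.3], [0.1, 0.5, 0.4], [0.9, 0.05, 0.05]):
    print(p, allocation(p, a), value_of_perfect_cue(p, a))
```

The three probability vectors are assumed examples chosen to fall in Region 1,
Region 2, and Region 3 respectively, so the printed values can be compared with
the three region formulas.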

Putting these all together, we get the surface shown in Figure 3.

[ Figure 3 about here ]

Figure 3: The fitness value of information $G$ as a function of the
probabilities that each environment occurs, for the symmetric three-environment
scenario with $a = 2$, displayed on the simplex $p_1 + p_2 + p_3 = 1$. Corner
labels include $p = (0, 0, 1)$ and $p = (1, 0, 0)$; the legend distinguishes
Region 1, Region 2, and Region 3.



Surprisingly, this fitness value of information surface seamlessly sews to- 
gether a region described by the mutual information (Region 1), a region de- 
scribed by the decision-theoretic value of information (Region 3), and an inter- 
mediate region (Region 2). Comparing the height of the surface and the gra- 
dients along the relevant edge and point boundaries, calculus reveals that this 
surface is again continuous and once continuously differentiable everywhere. 
The fitness value of information incorporates both the information-theoretic 
measure and the decision-theoretic value — and through the continuity of the 
corresponding regions, we also see a fundamental connection between these two 
measures. 



4 Extending the model 

Let us now assume that an organism has to make a developmental decision
between $n$ possible phenotypes, each of which is a best match to one of $n$
environments. The environments $\phi_i$ occur with probabilities $p_i$ and the
fitness of phenotype $j$ in environment $i$ is $w_{ij}$.

How should an organism respond? To maximize short-run expected fitness,
the organism should simply develop the phenotype with the highest expected
fitness. Expected fitness then will be $E[w] = \max_j \sum_i p_i w_{ij}$.

What about long-term fitness? We can find a general form for the fitness
value of information for those cases corresponding to Region 1 in the previous
example, i.e., where the organism develops into all $n$ phenotypes with positive
probability. Let us look at a strategy that produces phenotype $i$ with
frequency $x_i > 0$. The organism will be selected to maximize the expected log
growth rate $R$, so we want to find the strategy that maximizes the log growth
rate $\sum_i p_i \log \sum_j w_{ij} x_j$ subject to the constraint that the
fractional investments in the various phenotypes sum to one:
$\sum_i x_i = 1$. The Lagrangian for this problem is

$$L(x_1, x_2, \ldots, x_n, \lambda) = \sum_i p_i \log \sum_j w_{ij} x_j - \lambda \Big( \sum_i x_i - 1 \Big) \tag{7}$$

Since the constraint function is a linear function, it immediately satisfies
the constraint qualification that the partials of the constraint function at the
constrained maximizer are not all zero. We maximize the Lagrangian by taking
partial derivatives and setting to zero. The partials with respect to $x_k$
yield a set of $n$ equations:

$$\frac{\partial}{\partial x_k} \Big[ \sum_i p_i \log \sum_j w_{ij} x_j - \lambda \sum_i x_i \Big] = 0 \tag{8}$$

Assuming that $W$ (the matrix whose $ij$ entry is $w_{ij}$) is invertible, we
can write $y_i = \sum_j w_{ij} x_j$ and $V = W^{-1}$, so that
$x_j = \sum_i v_{ji} y_i$. Then we can solve

$$\frac{\partial}{\partial y_k} \Big[ \sum_i p_i \log y_i - \lambda \sum_{ij} v_{ji} y_i \Big] = 0 \tag{9}$$

for all $k$, which gives for all $k$:

$$\frac{p_k}{y_k} - \lambda \sum_j v_{jk} = 0 \tag{10}$$

Now we can solve for the constraint $\sum_i x_i = 1$, which gives
$\lambda = 1$, and thus we have $y_k = p_k / \sum_j v_{jk}$. Though this
solution for $y_k$ always exists, and the corresponding $x_i$ always satisfy
the constraint $\sum_i x_i = 1$, a solution might contain negative $x_i$, which
would not be biologically plausible. Thus the rest of the derivation will
assume that we are in a region of parameter space in which the solution is
indeed non-negative for all $x_i$. Substituting $y_k$ into the equation for the
log growth rate, to get the maximal log growth rate without the cue, gives

$$R_{\text{no inf}} = \sum_i p_i \log \Big( \frac{p_i}{\sum_j v_{ji}} \Big) = \sum_i p_i \log(p_i) - \sum_i p_i \log \Big( \sum_j v_{ji} \Big) \tag{11}$$

This expression is simply $-H(\Phi) + L(p)$, where $H(\Phi)$ is the entropy of
the environment and $L$ is a linear function of the probabilities of each
environmental state.

We would now like to calculate the value of a cue. This will be the difference 
in the expected log growth rate between this optimal strategy, and the optimal 
strategy when a cue C is received. 

First we consider a cue that reveals the exact environment. The organism will
maximize fitness by matching phenotype to the known environment, yielding a
log growth rate $R_{\text{inf}} = \sum_i p_i \log(w_{ii})$. Thus the value of
this cue is

$$G = R_{\text{inf}} - R_{\text{no inf}} = -\sum_i p_i \log(p_i) + \sum_i p_i \log \Big( w_{ii} \sum_j v_{ji} \Big) \tag{12}$$

This is the mutual information between the (perfectly informative) cue and
environment, $I(\Phi; C) = H(\Phi)$, plus a linear function of the
probabilities $p_i$.
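
The general solution can be evaluated numerically with the Python sketch below.
It is an illustration under stated assumptions: the payoff matrix is stored as
`W[i][j]` (fitness of phenotype `j` in environment `i`), the inverse is
computed with a small hand-written Gauss-Jordan routine so that no external
library is needed, and the result is only meaningful when every returned
investment is non-negative, as the derivation requires. All function names are
hypothetical.

```python
import math

def inverse(W):
    """Gauss-Jordan inverse of a small square matrix given as a list of rows."""
    n = len(W)
    A = [list(map(float, row)) + [float(i == j) for j in range(n)]
         for i, row in enumerate(W)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        d = A[col][col]
        A[col] = [v / d for v in A[col]]
        for r in range(n):
            if r != col:
                f = A[r][col]
                A[r] = [vr - f * vc for vr, vc in zip(A[r], A[col])]
    return [row[n:] for row in A]

def hedging_solution(p, W):
    """Return (x, R_no_inf) using y_k = p_k / sum_j v_jk and x = V y."""
    n = len(p)
    V = inverse(W)
    col_sums = [sum(V[j][k] for j in range(n)) for k in range(n)]
    y = [p[k] / col_sums[k] for k in range(n)]
    x = [sum(V[j][i] * y[i] for i in range(n)) for j in range(n)]
    R_no_inf = sum(p[i] * math.log(y[i]) for i in range(n))
    return x, R_no_inf

def value_of_perfect_cue(p, W):
    """G = R_inf - R_no_inf for a cue that reveals the environment exactly."""
    _, R_no_inf = hedging_solution(p, W)
    R_inf = sum(p[i] * math.log(W[i][i]) for i in range(len(p)))
    return R_inf - R_no_inf

# Assumed example: the symmetric three-environment payoffs with a = 2.
W = [[2, 1, 1], [1, 2, 1], [1, 1, 2]]
p = [0.4, 0.3, 0.3]
print(hedging_solution(p, W)[0], value_of_perfect_cue(p, W))
```

For this assumed example the investments should match those of Example 2 in
Region 1, which gives a quick consistency check between the general formula and
the worked case.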

Next we assume that the cue does not reveal the exact state of the
environment, but instead only contains partial information about the
environment. Let the mutual information between the cue $C$ and the environment
be $I(\Phi; C)$. The strategy of the organism will depend on the cue. We can
thus maximize growth rate for each cue separately using the conditional
probabilities of all environments $\Pr(\phi|c)$, and the same argument we used
above to calculate the optimal strategy. We used two assumptions there: first,
that $W$ is invertible, and this still holds, and second that in our solution
all $x_k$ are non-negative. We now assume that this is true for the responses
to all cues, and limit the domain of our solution correspondingly, as discussed
below. We can then compute the maximal growth rate by averaging equation 11
over all cues, for a maximal growth rate of:

$$\begin{aligned} R_{\text{inf}} &= \sum_c \Pr(c) \sum_\phi \Pr(\phi|c) \log(\Pr(\phi|c)) - \sum_c \Pr(c) \sum_\phi \Pr(\phi|c) \log \Big( \sum_j v_{j\phi} \Big) \\ &= \sum_c \sum_\phi \Pr(\phi, c) \log(\Pr(\phi|c)) - \sum_i p_i \log \Big( \sum_j v_{ji} \Big) \end{aligned} \tag{13}$$

This is $-H(\Phi|C) + L(p)$, where $H(\Phi|C)$ is the conditional entropy of
the environment given the cue. The fitness value of information $G$ conferred
by the cue is the difference in the growth rates:
$-H(\Phi|C) + L(p) + H(\Phi) - L(p) = I(\Phi; C)$. Thus $G$ is simply the
mutual information between the cue and the environment.

Note that the fitness difference will be exactly $I(\Phi; C)$ only when the
organism produces all phenotypes with positive probability both with and
without the cue. In our calculation we assumed that for all cues, all $x_i$ are
positive. If the environmental probabilities are such that the organism hedges
in the absence of a cue, then if the cue conveys sufficiently little
information, the organism will also do best to hedge after receiving the
signal, albeit with different fractions going into each phenotype. The reason
is that for each particular signal $c$, we get a solution using conditional
probabilities $\Pr(\phi|c)$ instead of the original probabilities
$\Pr(\phi)$. If all the $\Pr(\phi|c)$ are sufficiently close to $\Pr(\phi)$,
they will fulfill the same requirements that $\Pr(\phi)$ fulfills, and thus
there will be a solution with positive $x_i$ for each signal. In all such
cases, the fitness gained from a signal will be exactly the mutual information
between the signal and the environment. When we are outside this range, the
gain from a signal will be lower relative to the mutual information, and can
even be 0. For example, if without the signal no bet-hedging occurs, and all
signals convey so little information that no decision is changed, then the gain
in growth rate resulting from the signal will be 0.
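
These three cases can be illustrated with a Python sketch built on the
two-phenotype payoffs of Example 1. It assumes the log growth expression
$p \log[x + a(1 - x)] + (1 - p) \log[bx + (1 - x)]$, a binary cue that reports
the true environment with probability `q` (a hypothetical parameter), and the
clipped optimum for each posterior. The function names and the specific values
of `p` and `q` are assumptions made for illustration.

```python
import math

def best_growth(p, a, b):
    """Maximum expected log growth when environment 1 has probability p."""
    x = min(1.0, max(0.0, p / (1 - b) - (1 - p) * a / (1 - a)))
    return p * math.log(x + a * (1 - x)) + (1 - p) * math.log(b * x + 1 - x)

def cue_value_and_information(p, q, a, b):
    """Return (G, I) for a binary cue that is correct with probability q."""
    # joint[i][k] = Pr(environment i + 1, cue value k + 1)
    joint = [[p * q, p * (1 - q)], [(1 - p) * (1 - q), (1 - p) * q]]
    p_c = [joint[0][k] + joint[1][k] for k in range(2)]
    growth_with = sum(
        p_c[k] * best_growth(joint[0][k] / p_c[k], a, b) for k in range(2)
    )
    G = growth_with - best_growth(p, a, b)
    p_phi = [p, 1 - p]
    I = sum(
        joint[i][k] * math.log(joint[i][k] / (p_phi[i] * p_c[k]))
        for i in range(2) for k in range(2)
    )
    return G, I

a, b = 0.65, 0.35
print(cue_value_and_information(0.7, 0.55, a, b))  # weak cue, hedging throughout
print(cue_value_and_information(0.7, 0.95, a, b))  # strong cue, hedging abandoned
print(cue_value_and_information(0.3, 0.55, a, b))  # no hedging, weak cue
```

With these assumed values, the first pair should show $G$ equal to $I$, the
second $G$ below $I$, and the third a gain of zero despite positive mutual
information.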

5 Bounding the fitness value of information 

We can also show that the fitness value of information is bounded above by the 
mutual information between cue and environment. Compare the expected log 
growth rate of individuals of two types. Type A individuals receive a cue $C$
with possible values $c_1, c_2, c_3, \ldots, c_n$ drawn from a distribution with
probability function $\Pr(C)$ and entropy $H(C)$. Each individual then
maximizes expected log growth rate by following some investment strategy
$s(c)$ that sets how to invest in the various phenotypes, given the receipt of
cue $c$.

Type B individuals do not receive this cue. Instead, they follow the betting
strategy $r = \sum_c \Pr(c) s(c)$, thereby employing a probability-matching
mixture of the various $s(c)$ strategies used by Type A individuals.

Represent the fitness of an individual using strategy $s(y)$ when the cue was
$c$ by $w(s(y)|c)$. The expected log growth rate for Type A individuals is then

$$R_A = \sum_c \Pr(c) \log[w(s(c)|c)]. \tag{14}$$

The expected log growth rate for Type B individuals is

$$R_B = \sum_c \Pr(c) \log \Big[ \sum_{c'} \Pr(c') w(s(c')|c) \Big]. \tag{15}$$

Since fitnesses are non-negative, the $w(s(c')|c)$ terms in the summation above
must be at least zero even for $c' \neq c$, and therefore
$\sum_{c'} \Pr(c') w(s(c')|c) \ge \Pr(c) w(s(c)|c)$. Since log is a monotone
function, this implies
$\log \sum_{c'} \Pr(c') w(s(c')|c) \ge \log \Pr(c) w(s(c)|c)$. Thus:

$$\begin{aligned} R_B &\ge \sum_c \Pr(c) \log[\Pr(c) w(s(c)|c)] \\ &= \sum_c \Pr(c) \log[w(s(c)|c)] + \sum_c \Pr(c) \log \Pr(c) \\ &= R_A - H(C). \end{aligned} \tag{16}$$

Since a growth rate of at least $R_B$ can be attained without information, the
fitness value of information is therefore bounded by
$G \le R_A - R_B \le H(C) = I(\Phi; C)$. Thus the fitness value of information
is at most equal to the mutual information between a perfectly informative cue
and the environment, irrespective of the actual fitness payoffs $w$. As before,
this result can be generalized to partially informative cues [?].

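
The inequality $R_A - R_B \le H(C)$ does not depend on the payoffs or on the
strategies being optimal, so it can be spot-checked with random inputs. The
Python sketch below does this under assumptions made for illustration: cue
value `c` identifies environment `c`, `w[c][j]` is the payoff of phenotype `j`
in that environment, a mixed strategy earns the corresponding weighted average,
and the random ranges and seed are arbitrary. Function names are hypothetical.

```python
import math
import random

def growth_rates(p_c, s, w):
    """R_A for cue users and R_B for the probability-matching mixture r."""
    n = len(p_c)

    def fitness(strategy, c):
        # Linear payoff of a mixed strategy in the environment indicated by c.
        return sum(strategy[j] * w[c][j] for j in range(n))

    r = [sum(p_c[c] * s[c][j] for c in range(n)) for j in range(n)]
    R_A = sum(p_c[c] * math.log(fitness(s[c], c)) for c in range(n))
    R_B = sum(p_c[c] * math.log(fitness(r, c)) for c in range(n))
    return R_A, R_B

def cue_entropy(p_c):
    """Entropy H(C) of the cue distribution, in nats."""
    return -sum(q * math.log(q) for q in p_c if q > 0)

def random_distribution(rng, n):
    """A random probability vector with strictly positive entries."""
    raw = [rng.random() + 0.05 for _ in range(n)]
    total = sum(raw)
    return [v / total for v in raw]

rng = random.Random(0)          # fixed seed so the run is repeatable
n = 3
p_c = random_distribution(rng, n)
s = [random_distribution(rng, n) for _ in range(n)]
w = [[rng.uniform(0.1, 3.0) for _ in range(n)] for _ in range(n)]
R_A, R_B = growth_rates(p_c, s, w)
print(R_A - R_B, cue_entropy(p_c))
```

In every draw the first printed number should not exceed the second.
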
6 Discussion

In this paper we have shown that two measures of information, the information- 
theoretic mutual information and the decision-theoretic value of information, 
are united into a single measure when one looks at the strategies that natu- 
ral selection will favor, namely those that maximize the long term growth rate 
of biological organisms. Furthermore, we have shown that under conditions in 
which bet-hedging is advantageous, and with cues that convey little informa- 
tion, the fitness value of information associated with those cues is exactly the 
mutual information between the cue and the environment. Finally, we have 
shown that the fitness value of an informative cue is bounded above by the mu- 
tual information between that cue and the environment, and in some cases is 
equal to exactly this value. These results establish a close relationship between 
biological fitness and information-theoretic measures such as entropy or mutual 
information.

But why does this relation exist? To answer that question, we should take
a closer look at the concept of information: information is the reduction of
uncertainty, where uncertainty measures the number of states a system might be
in. Thus mutual information between the world and a cue is the fold reduction in
uncertainty about the world after the cue is received. For example, if a system
could be in any of six equiprobable states, and a cue serves to narrow the
realm of possibility to just three of these, the cue provides a twofold
reduction in uncertainty. For reasons of convenience, information is measured
as the logarithm of the fold reduction in uncertainty; this ensures that the
measure is additive, so that for example we can add the information received by
two successive cues to calculate the total information gained.

Thus while information concepts are often thought to be linked with the famous
sum $\sum p \log(p)$, the fundamental concept is not a particular mathematical
formula. Rather, it is the notion that information measures the fold reduction
in uncertainty about the possible states of the world.

With this view, it is easy to see why information bears a close relation to bio- 
logical fitness: For simplicity, consider an extreme example in which individuals 
survive only if their phenotype matches the environment exactly, and suppose 
that there are ten possible environments that occur with equal probability. In 
the absence of any cue about the environment, the best the organism can do 
is randomly choose one of the ten possible phenotypes with equal probability.
Only one tenth of the individuals will then survive, since only a tenth will match 
the environment with their phenotype. If a cue conveys 1 bit of information and 
thus reduces the uncertainty about the environment twofold, the environment 
can be only in one of five possible states. The organism will now choose ran- 
domly one of five possible phenotypes, and now a fifth of the population will 
survive — a twofold increase in fitness, or a gain of 1 bit in the log of the growth 
rate. 
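
The arithmetic of this example is small enough to check directly. The Python
sketch below assumes, as in the text, that the organism picks uniformly among
the environments still considered possible and survives only on an exact match;
the function name is hypothetical.

```python
import math

def log2_survival(n_candidates):
    """Log2 of the surviving fraction when one of n equiprobable
    phenotypes is chosen at random and only an exact match survives."""
    return math.log2(1.0 / n_candidates)

without_cue = log2_survival(10)   # ten equiprobable environments
with_cue = log2_survival(5)       # the cue halves the candidates
print(with_cue - without_cue)     # gain in log2 growth, in bits
```

The printed gain should be one bit up to floating-point rounding, matching the
twofold increase in the surviving fraction.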

What happens when the environments are not equiprobable? In this case we
can understand the connection between information and fitness by looking to
long sequences of environments and the theory of typical sequences. The theory
tells us that almost surely one of the "typical sequences" (those sequences in
which the environments occur in their expected frequencies) will occur [?].
Moreover, all typical sequences occur with equal probability. Thus a lineage
is selected to divide its members equally among all typical sequences. Since
any one mistake in phenotype is lethal, only a fraction of these lineages, those
that have just the right sequence, will survive. The number of typical sequences
in this case is exactly $2^{N H(\Phi)}$, where $N$ is the number of generations
in the sequence and $H(\Phi)$ is the entropy of the environment.
Correspondingly, the fraction of surviving lineages will be
$2^{-N H(\Phi)}$. If a cue $C$ is received that reduces the uncertainty of the
environments by $I(\Phi; C)$, then the fraction of surviving lineages can be
increased by a factor of exactly $2^{N I(\Phi; C)}$. This is analogous to the
situation in communication: if we need to encode a string of symbols that are
not equiprobable, we turn to a long sequence of such symbols. Our code then
needs only to be efficient for representing typical sequences of symbols, and
those typical sequences occur with equal probability. The number of such
sequences is $2^{NH}$, where $N$ is the length and $H$ is the entropy of the
symbols. If the message recipient also obtains side information related to the
message itself, then the mutual information
$I(\text{message}; \text{side information})$ measures the reduction in the
number of possible messages that need to be encoded by the transmitter. This
number of messages is reduced by exactly
$2^{N I(\text{message}; \text{side information})}$-fold by the presence of the
side information.

We can now see why the concept of information is the same across different 
disciplines. In biology, fitness refers to the fold increase in the number of surviv- 
ing lineages. In communication theory, information refers to the fold increase in 
the number of messages to encode. In physics, entropy refers to the fold increase 
in the number of possible states in phase space. 

Finally, our results also suggest that information theory will be useful in 
studying the evolution of communication. Even before knowing what a biologi- 
cal signal means, how it is used, or what the fitness structure of the environment 
may be, we have shown that one can place an upper bound on the fitness con- 
sequences of responding to that signal, simply by measuring the information 
content of the signalling channel.